Introduction


There are many unsolved problems concerning the distribution of prime numbers. For example, it is not known whether there is an infinity of 'twins', pairs p and p + 2 both prime, although empirical evidence strongly suggests that there is (see [1]). In this paper the broader question of the distribution of small even gaps between successive large primes is investigated. The arguments used involve statistical assumptions which, although intuitively reasonable, are not, and perhaps cannot be, rigorously justified. Hence the results obtained are not formally proven. They are, however, very well supported by extensive empirical evidence. The merit claimed for the results of this paper is that, theoretically justifiable or not, they give an extremely good representation of the actual distribution of small prime gaps. Considering the irregularities of this distribution (see Diagram 1), any reasonable explanation of it is interesting.
Notation


This section is probably best referred to when needed below.

Throughout, let $Q$ be the set of odd primes $3, 5, 7, 11, \ldots$, and let $q \in Q$. Let $N$ be a large integer, $p$ a (varying) prime with $p \sim N$, and $r$ a small positive integer.

$V$ is the set of all $r$-tuples $v = (v_1, \ldots, v_r)$, where each $v_i$ is 0 or 1 and $v_r = 1$.

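For $k \ge 1$, define

$$c_k = \prod_{q \in Q,\; q > k+1} \left(1 - \frac{k}{(q-1)(q-k)}\right),$$

and, for $r \ge k$, define

$$F_{r,k} = \frac{-\left(-2 \prod_{q \in Q,\; q \le r+1} \frac{q}{q-1}\right)^{k} c_1 c_2 \cdots c_k}{\prod_{q \in Q,\; q \le r+1} \; \prod_{i=1}^{\min(k,\,q-2)} \left(1 - \frac{i}{(q-1)(q-i)}\right)}.$$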

For $v \in V$, let the nonzero components of $v$ be, in order, $v_{n_1}, v_{n_2}, \ldots, v_{n_k}$ (so $n_k = r$), and let $n_0 = 0$.

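If $L$ is the set of residues $n_j \pmod q$ for $j = 0, 1, \ldots, i-1$, and $|L|$ denotes the number of elements of $L$, then define

$$g(q,i,v) = \begin{cases} 0 & \text{if } n_i \pmod q \in L, \\ 1/(q - |L|) & \text{otherwise.} \end{cases}$$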
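Finally, let

$$h(v) = \prod_{q \in Q,\; q \le r+1} \; \prod_{i=1}^{k} \bigl(1 - g(q,i,v)\bigr),$$

where $k$ is the number of nonzero components of $v$, and

$$S_{r,k} = \sum_{v \in V,\; v_1 + \cdots + v_r = k} h(v).$$

The notation $m \nmid n$ means that $n$ is not divisible by $m$.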
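The definitions above can be turned into a direct computation of $A_{r,k} = F_{r,k} S_{r,k}$ (the constants used in Conjecture 1 below). The following Python is only a sketch under our own assumptions, not the program used for Table 1: it enumerates every $v \in V$ with $k$ nonzero components, evaluates $g$ and $h$ exactly with rational arithmetic, and keeps the factor $c_1 \cdots c_k$ separate, so that the hypothetical function `a_coefficient(r, k)` returns the rational number $a$ with $A_{r,k} = a \, c_1 \cdots c_k$. All function names are ours.

```python
from fractions import Fraction
from itertools import combinations
from math import isqrt, prod


def odd_primes_upto(n):
    """Odd primes q <= n (trial division suffices for the small q <= r + 1)."""
    return [q for q in range(3, n + 1, 2)
            if all(q % d for d in range(3, isqrt(q) + 1, 2))]


def g(q, i, n):
    """g(q, i, v), where n = (n_0, n_1, ..., n_k) lists the positions of the ones of v."""
    L = {n[j] % q for j in range(i)}          # residues of n_0, ..., n_{i-1} mod q
    if n[i] % q in L:
        return Fraction(0)
    return Fraction(1, q - len(L))


def s_sum(r, k):
    """S_{r,k}: sum of h(v) over all v in V with exactly k nonzero components."""
    qs = odd_primes_upto(r + 1)
    total = Fraction(0)
    for middle in combinations(range(1, r), k - 1):   # v_r = 1 is forced
        n = (0, *middle, r)
        total += prod((1 - g(q, i, n) for q in qs for i in range(1, k + 1)),
                      start=Fraction(1))
    return total


def f_rational(r, k):
    """F_{r,k} divided by c_1 c_2 ... c_k."""
    qs = odd_primes_upto(r + 1)
    base = -2 * prod((Fraction(q, q - 1) for q in qs), start=Fraction(1))
    denom = prod((1 - Fraction(i, (q - 1) * (q - i))
                  for q in qs for i in range(1, min(k, q - 2) + 1)),
                 start=Fraction(1))
    return -(base ** k) / denom


def a_coefficient(r, k):
    """Rational number a with A_{r,k} = a * c_1 * ... * c_k."""
    return f_rational(r, k) * s_sum(r, k)


def a_value(r, k, c):
    """A_{r,k} as a float; c[j - 1] must hold c_j for j = 1, ..., k."""
    if len(c) < k:
        raise ValueError("need c_1, ..., c_k")
    return float(a_coefficient(r, k)) * prod(c[:k])
```

For example, `a_coefficient(3, 2)` is $-12$, and with $c_1 = 0.66016$, $c_2 = 0.72160$, $c_3 = 0.48412$ (the values quoted under Empirical Tests) `a_value(4, 3, [0.66016, 0.72160, 0.48412])` is about 4.1512, in line with Table 1. Exact fractions are used so that vanishing terms (for instance $A_{2,2} = 0$, since one of $p, p+2, p+4$ is divisible by 3) come out as exact zeros.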
Theory


Everywhere, "the probability of event E given F", written $P(E \mid F)$, should be interpreted as relative frequency, in a sense which should be clear from the context.

We are concerned with finding a function $f(r)$ which approximates the probability that a prime gap in a given region will be of length $2r$. More precisely, if $M$ is an integer, large compared to $r$ and $\log N$ but small compared to $N$, if there are $n + 1$ primes in the interval $(N - M, N + M)$, and if $m$ of the gaps between consecutive primes in this interval are of length exactly $2r$, then we expect that

$$m/n \approx f(r).$$


The point of this paper is the substantiation of:

Conjecture 1

Let $A_{r,k} = F_{r,k} S_{r,k}$, where $F_{r,k}$ and $S_{r,k}$ are defined above. Then for small $r$, i.e. $r \ll \log N$, a function $f$ satisfying the conditions of the previous paragraph is

$$f(r) = \sum_{k=1}^{r} \frac{A_{r,k}}{(\log N)^k}.$$

(Table 1 gives some computed values for the $A_{r,k}$.)

Before discussing the Conjecture, it is interesting to deduce some of its immediate consequences:


Corollary 1

For fixed $r$,

$$f(r) = \frac{A_{r,1}}{\log N}\,\bigl(1 + o(1)\bigr) \quad \text{as } N \to \infty.$$


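The proof is immediate. Note that, from the definition of $A_{r,k}$, we have

$$A_{r,1} = 2c_1 \prod_{q \in Q,\; q \mid r} \frac{q-1}{q-2},$$

since $F_{r,1} = 2c_1 \prod_{q \in Q,\; q \le r+1} \frac{q-1}{q-2}$ (each factor being $\frac{q}{q-1} \big/ \bigl(1 - \frac{1}{(q-1)^2}\bigr)$) and $S_{r,1} = h((0, \ldots, 0, 1)) = \prod_{q \in Q,\; q \le r+1,\; q \nmid r} \frac{q-2}{q-1}$. As $\prod_{q \in Q} \frac{q-1}{q-2}$ diverges, the $A_{r,1}$ are unbounded.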
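As a numerical illustration of this formula (a sketch; the function names and the choice of odd primorials are ours), the following Python computes $A_{r,1}$ from the quoted value $c_1 = 0.66016$ and evaluates it along $r = 3,\ 3 \cdot 5,\ 3 \cdot 5 \cdot 7, \ldots$, where the divergence of the product shows up as steady growth.

```python
C1 = 0.66016  # c_1, the value quoted in the text


def odd_prime_divisors(r):
    """Distinct odd primes dividing r, in increasing order."""
    out, m, d = [], r, 3
    while m % 2 == 0:          # the prime 2 plays no role
        m //= 2
    while d * d <= m:
        if m % d == 0:
            out.append(d)
            while m % d == 0:
                m //= d
        d += 2
    if m > 1:
        out.append(m)
    return out


def a_r1(r, c1=C1):
    """A_{r,1} = 2 c_1 * product over odd primes q | r of (q - 1)/(q - 2)."""
    value = 2 * c1
    for q in odd_prime_divisors(r):
        value *= (q - 1) / (q - 2)
    return value


# A_{r,1} along odd primorials r = 3, 15, 105, ...: each step multiplies by (q-1)/(q-2) > 1.
primorial = 1
for q in (3, 5, 7, 11, 13):
    primorial *= q
    print(primorial, round(a_r1(primorial), 4))
```

The growth is slow, since $\prod_{q \le x} \frac{q-1}{q-2}$ increases only like a constant times $\log x$.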

In the following, by $a \sim b$ we always mean that $\lim_{N \to \infty} a/b = 1$.


Corollary 2

If $\eta(r)$ is the number of pairs of consecutive primes $p$ and $p + 2r$ with $p < N$, then

$$\eta(r) \sim \frac{A_{r,1} N}{(\log N)^2}.$$


Proof

From Corollary 1 and the prime number theorem, we see that

$$\eta(r) \sim \int_2^N \frac{A_{r,1}}{\log t} \, \frac{dt}{\log t},$$

and integration by parts gives the result.
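The integration by parts, written out for convenience (a routine check, not part of the original argument), is

$$\int_2^N \frac{dt}{(\log t)^2} = \left[\frac{t}{(\log t)^2}\right]_2^N + 2\int_2^N \frac{dt}{(\log t)^3} = \frac{N}{(\log N)^2} + O\!\left(\frac{N}{(\log N)^3}\right),$$

where the remaining integral is split at $\sqrt N$: the part over $[2, \sqrt N]$ is $O(\sqrt N)$, and the part over $[\sqrt N, N]$ is at most $8N/(\log N)^3$ because $\log t \ge \frac12 \log N$ there. Multiplying by $A_{r,1}$ gives Corollary 2.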
Corollary 3

Putting 1 for $r$ in Corollary 2, the number of twin primes less than $N$ is

$$\sim \frac{2c_1 N}{(\log N)^2}.$$

Again I would emphasize that Corollaries 1-3, while following rigorously from Conjecture 1, have not been proven, for they depend on the informal arguments used below to substantiate (not prove) Conjecture 1.

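Before discussing Conjecture 1, we need some definitions and a Lemma. Let $v \in V$, and let $p$ range over the primes near $N$ as before. For $r' \le r$, define

$$\bar q(r', v) = P(1 \le i \le r' \wedge v_i = 1 \Rightarrow p + 2i \in Q)$$

and

$$q(r', v) = P\bigl(1 \le i \le r' \Rightarrow (p + 2i \in Q \Leftrightarrow v_i = 1)\bigr),$$

where parentheses may be restored by the usual conventions (the conditions are to hold for every $i$).

We shall abbreviate $q(r, v)$ by $q(v)$ and $\bar q(r, v)$ by $\bar q(v)$.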
Define

$$s(v) = (-1)^{1 + \sum_{i=1}^{r} v_i}.$$

If $v, v' \in V$ we write $v' \ge v$ if $v'_i \ge v_i$ for each $i = 1, \ldots, r$.
We shall see below that it is possible to estimate $\bar q(v)$, so we need to express the function $f$ in terms of the $\bar q(v)$. The following Lemma does this:

Lemma

$$f(r) = \sum_{v \in V} s(v)\, \bar q(v).$$


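Proof

From the definition of $q$ we have

$$f(r) = q((0, 0, \ldots, 0, 1)), \qquad (1)$$

but from the definition of $\bar q$ it is easy to see that

$$\bar q(v) = \sum_{v' \ge v} q(v').$$

Hence

$$\sum_{v \in V} s(v)\, \bar q(v) = \sum_{v' \in V} \Bigl( q(v') \sum_{v \le v'} s(v) \Bigr). \qquad (2)$$

But if $v'$ has $k'$ nonzero components then, since $v_r = v'_r = 1$ is forced, choosing $v \le v'$ amounts to choosing $j$ of the other $k' - 1$ nonzero components of $v'$, so

$$\sum_{v \le v'} s(v) = \sum_{j=0}^{k'-1} (-1)^j \binom{k'-1}{j} = \begin{cases} 0 & \text{if } k' \ne 1, \\ 1 & \text{if } k' = 1. \end{cases}$$

The only $v' \in V$ with $k' = 1$ is $(0, \ldots, 0, 1)$, so the result follows from (1) and (2).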
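The sign identity in this proof is easy to confirm by brute force for small $r$. The Python sketch below (our own check; the helper names are hypothetical) enumerates $V$ directly, with $v_r = 1$ forced, and evaluates $\sum_{v \le v'} s(v)$ for every $v' \in V$.

```python
from itertools import product


def s(v):
    """s(v) = (-1)^(1 + v_1 + ... + v_r)."""
    return (-1) ** (1 + sum(v))


def tuples_V(r):
    """All r-tuples of 0s and 1s whose last component v_r is 1."""
    return [head + (1,) for head in product((0, 1), repeat=r - 1)]


def inner_sum(v_prime):
    """Sum of s(v) over v in V with v <= v_prime componentwise."""
    r = len(v_prime)
    return sum(s(v) for v in tuples_V(r)
               if all(a <= b for a, b in zip(v, v_prime)))


for r in range(1, 9):
    for vp in tuples_V(r):
        expected = 1 if sum(vp) == 1 else 0   # k' = 1 only for (0, ..., 0, 1)
        assert inner_sum(vp) == expected
print("identity holds for r = 1, ..., 8")
```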


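Now we are ready to complete the substantiation of Conjecture 1. From the definition of conditional probability, we see that

$$\begin{aligned} \frac{\bar q(r, v)}{\bar q(n_{k-1}, v)} &= P(p + 2r \in Q \mid 1 \le i < k \Rightarrow p + 2n_i \in Q) \\ &= P(q \in Q \wedge q < p \Rightarrow q \nmid p + 2r \mid 1 \le i < k \Rightarrow p + 2n_i \in Q). \end{aligned}$$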


At this stage we make an assumption which, although reasonable, is really only justified by the agreement of Conjecture 1 with empirical data. We assume independence of divisibility by the different primes $q$ in the above expression. Actually, it is enough to assume that this is a good approximation for primes $q$ small compared to $p$.

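The assumption gives

$$\frac{\bar q(r, v)}{\bar q(n_{k-1}, v)} = \prod_{q \in Q,\; q < p} P_q, \qquad (3^*)$$

where

$$P_q = P(q \nmid p + 2r \mid 1 \le i < k \Rightarrow p + 2n_i \in Q). \qquad (4)$$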


We now make a rather similar assumption, that the condition $p + 2n_i \in Q$ only affects $P_q$ in that it ensures that $q \nmid p + 2n_i$. This gives

$$\begin{aligned} P_q &= P(q \nmid p + 2r \;\big|\; 1 \le i < k \Rightarrow q \nmid p + 2n_i) \qquad (*) \\ &= 1 - P(q \mid p + 2r \;\big|\; 1 \le i < k \Rightarrow q \nmid p + 2n_i), \end{aligned}$$

but considering the possibilities for $p + 2r \pmod q$, bearing in mind that $p$, being prime, is not divisible by $q$, and looking back to the definition of $g$, it is not difficult to see that the last term is just $g(q, k, v)$. Hence

$$P_q = 1 - g(q, k, v). \qquad (5)$$
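The step leading to (5) can be checked by counting residues. If, subject to the stated conditions, $p \bmod q$ is taken to be equally likely to be any residue not ruled out (this uniformity is exactly the modelling assumption), the conditional probability is a ratio of residue counts. The Python sketch below is our own illustration with hypothetical function names; it computes that ratio by enumeration and compares it with $g(q, k, v)$ for one $v$.

```python
from fractions import Fraction


def g(q, i, n):
    """g(q, i, v) from the Notation section; n = (n_0 = 0, n_1, ..., n_k)."""
    L = {n[j] % q for j in range(i)}
    return Fraction(0) if n[i] % q in L else Fraction(1, q - len(L))


def by_residues(q, i, n):
    """P(q | p + 2 n_i, given q does not divide p + 2 n_j for j < i), p mod q uniform.

    The case j = 0 encodes that q does not divide p itself, since n_0 = 0."""
    allowed = [a for a in range(q)
               if all((a + 2 * n[j]) % q != 0 for j in range(i))]
    if not allowed:               # the conditions cannot all hold
        return Fraction(0)
    hits = [a for a in allowed if (a + 2 * n[i]) % q == 0]
    return Fraction(len(hits), len(allowed))


n = (0, 1, 3, 4, 6)      # the ones of v = (1, 0, 1, 1, 0, 1), so r = 6
for q in (3, 5, 7):
    for i in range(1, len(n)):
        assert by_residues(q, i, n) == g(q, i, n)
print("residue counts agree with g for this v")
```

The agreement reflects the fact that 2 is invertible modulo an odd $q$, so excluding $p \equiv -2n_j \pmod q$ removes exactly $|L|$ residues.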

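Since $p$ is odd, the prime number theorem gives

$$\frac{2}{\log N} = P(q \in Q \wedge q < p \Rightarrow q \nmid p + 2r).$$

By another assumption similar to those above, this is

$$\prod_{q \in Q,\; q < p} P(q \nmid p + 2r) = \prod_{q \in Q,\; q < p} \left(1 - \frac{1}{q}\right). \qquad (*)$$

Dividing (3*) by this and using (5), we obtain

$$\frac{\bar q(r, v)}{\bar q(n_{k-1}, v)} = \frac{2}{\log N} \prod_{q \in Q,\; q < p} \frac{1 - g(q, k, v)}{1 - 1/q}. \qquad (7)$$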
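Observe that if $q > r$ then

$$g(q, k, v) = 1/(q - k),$$

and if $q > r + 1$ then, since $k \le r$, this is $< 1$. Now the product

$$\prod_{q \in Q,\; q > k+1} \frac{1 - 1/(q-k)}{1 - 1/q}$$

converges (its factors are $1 - k/((q-1)(q-k))$), and we assumed that $p \sim N$ was large, so in (7) the condition $q < p$ may be dropped. Also, since $\bar q(0, v) = 1$, we have

$$\bar q(r, v) = \frac{\bar q(r, v)}{\bar q(n_{k-1}, v)} \cdot \frac{\bar q(n_{k-1}, v)}{\bar q(n_{k-2}, v)} \cdots \frac{\bar q(n_1, v)}{\bar q(0, v)},$$

so from (7), applied to each factor with $r$ and $k$ replaced by $n_j$ and $j$,

$$\bar q(r, v) = \left(\frac{2}{\log N}\right)^{k} \prod_{i=1}^{k} \prod_{q \in Q} \frac{1 - g(q, i, v)}{1 - 1/q}. \qquad (8)$$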


Now substitution of (8) into the result of the Lemma, and a rearrangement of the products using the observation about $g$ above, gives the required result. Steps where statistical assumptions were made are indicated by (*).
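For the reader's convenience, the rearrangement can be written out; it only regroups products already present. Group the $v \in V$ by the number $k$ of their nonzero components, so that $s(v) = (-1)^{k+1}$, and split the product over $q$ in (8) at $r + 1$. For $q > r + 1$ the observation about $g$ gives the factors $1 - i/((q-1)(q-i))$, and for $q \le r + 1$ the denominators $1 - 1/q$ contribute $(q/(q-1))^k$:

$$\prod_{i=1}^{k} \prod_{q \in Q,\; q > r+1} \left(1 - \frac{i}{(q-1)(q-i)}\right) = \frac{c_1 c_2 \cdots c_k}{\prod_{q \in Q,\; q \le r+1} \prod_{i=1}^{\min(k,\,q-2)} \left(1 - \frac{i}{(q-1)(q-i)}\right)},$$

$$\prod_{i=1}^{k} \prod_{q \in Q,\; q \le r+1} \frac{1 - g(q,i,v)}{1 - 1/q} = \Bigl(\prod_{q \in Q,\; q \le r+1} \frac{q}{q-1}\Bigr)^{k} h(v).$$

Since $(-1)^{k+1} 2^k = -(-2)^k$, the Lemma then gives

$$f(r) = \sum_{k=1}^{r} \frac{F_{r,k}}{(\log N)^k} \sum_{v \in V,\; v_1 + \cdots + v_r = k} h(v) = \sum_{k=1}^{r} \frac{F_{r,k} S_{r,k}}{(\log N)^k},$$

which is the formula of Conjecture 1.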


Empirical Tests

First it was necessary to evaluate the constants $A_{r,k}$. The $c_k$ for $k = 1, 2, \ldots, 40$ were calculated by taking the product over primes less than 40000, and roughly approximating the remainder by an integral. The first few are $c_1 = 0.66016$, $c_2 = 0.72160$, $c_3 = 0.48412$, $c_4 = 0.65085$, $c_5 = 0.45529$, $c_6 = 0.71314$, $c_7 = 0.62911$, $c_8 = 0.51704$, and $c_9 = 0.34787$. Computation of the $A_{r,k}$ is more interesting. Difficulties soon arise because of the large number of terms in the sum $S_{r,k}$ when $k$ is large (in fact, when $k$ is not very small). The $A_{r,k}$ were computed by a straightforward method for $r \le 18$, $k \le r$, and also for $r = 19, 20, 21$, $k \le 8$. See Table 1. An interesting combinatorial problem, which we shall not discuss here, is the computation of the function

$$u(r) = \max\{k \le r \mid A_{r,k} \ne 0\}.$$
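The computation of the $c_k$ described above can be sketched as follows. This Python is an illustration under our own assumptions, not the original program: the finite product runs over odd primes below 40000, as in the text, and for the remainder we assume the crude estimate $\sum_{q \ge B} \log\bigl(1 - k/((q-1)(q-k))\bigr) \approx -k\int_B^\infty dt/(t^2 \log t) \approx -k/(B \log B)$; the exact integral used originally is not stated. Function names are hypothetical.

```python
import math


def primes_below(n):
    """All primes < n by the sieve of Eratosthenes."""
    flags = bytearray([1]) * n
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n - 1) + 1):
        if flags[i]:
            flags[i * i::i] = bytes(len(range(i * i, n, i)))
    return [i for i in range(n) if flags[i]]


PRIMES = primes_below(40000)


def c_const(k, bound=40000, primes=PRIMES):
    """c_k = product over odd primes q > k + 1 of (1 - k/((q - 1)(q - k))).

    `primes` must contain every prime below `bound`."""
    log_c = sum(math.log1p(-k / ((q - 1) * (q - k)))
                for q in primes if q % 2 == 1 and k + 1 < q < bound)
    # Assumed tail estimate for the primes q >= bound (see the lead-in).
    log_c -= k / (bound * math.log(bound))
    return math.exp(log_c)


print([round(c_const(k), 5) for k in range(1, 10)])
```

Working with logarithms avoids accumulating rounding error over several thousand factors. For $k = 1, 2, 3$ the results agree with the values quoted above to about four decimal places.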

Eleven blocks, each of about $8 \cdot 10^6$ numbers and in the region from $6 \cdot 10^6$ to $2 \cdot 10^{10}$, were searched for primes, and for each block the actual distribution of gaps was found. Taking for $N$ the midpoint of the block (this is not critical), the probabilities $f(1), f(2), \ldots, f(21)$ and $1 - \sum_{r=1}^{21} f(r)$ were calculated from the $A_{r,k}$ and Conjecture 1. The 'predicted distribution' was just these probabilities multiplied by the total observed number of gaps (so one degree of freedom is lost), and the predicted and actual distributions were compared. In no case did the $\chi^2$ test indicate a significant difference at the 5% (or even at the 10%) level. Generally, the fit seemed slightly better than chance, which is perhaps reasonable on intuitive grounds, but in only three of the eleven cases was $\chi^2_{21}$ significantly small at the 5% level. The intervals, the number of primes in them, $\chi^2_{21}$ for 21 degrees of freedom, and the probability of such a $\chi^2_{21}$ being exceeded in sampling from identical populations are shown in Table 2.
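The comparison just described can be sketched as follows. Given the observed counts in the 22 classes (gaps $2, 4, \ldots, 42$ and the remainder) and the predicted probabilities $f(1), \ldots, f(21)$ and $1 - \sum f(r)$, the expected counts are the probabilities times the observed total, and Pearson's statistic is summed over the classes. This Python is our own illustration with hypothetical helper names; converting the statistic into the tail probability in the last column of Table 2 would need a $\chi^2$ distribution function, for example SciPy's `scipy.stats.chi2.sf`, which is not used here.

```python
def expected_counts(probs, total):
    """Predicted distribution: the class probabilities times the observed total."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    return [p * total for p in probs]


def pearson_chi2(observed, probs):
    """Pearson chi-squared statistic of observed counts against class probabilities.

    Scaling to the observed total costs one degree of freedom, so 22 classes
    give 21 degrees of freedom, as in Table 2."""
    expected = expected_counts(probs, sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def predicted_probs(f_values):
    """Append the remainder class 1 - sum f(r) to the list f(1), ..., f(R)."""
    return list(f_values) + [1.0 - sum(f_values)]
```

For instance, `pearson_chi2([10, 20], [0.5, 0.5])` compares counts 10 and 20 with expected 15 and 15, giving $50/15 \approx 3.33$.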

The method of searching for primes was a sieve method similar to that described in [3]. Primes up to the square root of the largest number to be tested are first found by some method, and then blocks of numbers are 'sieved'. Only odd numbers are considered, and only a one-bit flag for each number is necessary. Actually, it is quicker to use the smallest addressable unit. The blocks should be as large as possible. On a CDC 3200 with 15-bit index registers (with sign extension to 17 bits for character addressing) and 1's complement arithmetic, a block length of $2^{15} - 1$ can be used, and the innermost loop is only three instructions with one storage reference. The method is very fast compared, say, to the ALGOL procedures [2]. Around $10^7$ the time to search a million numbers and output the roughly 60,000 primes to tape (for possible future use) was 20.1 seconds; around $10^{10}$ this increased to 30.4 seconds. The program was checked using the amazingly accurate tables [4], and all computing was done on a CDC 3200 at Monash University.
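A modern analogue of this block sieve can be sketched in a few lines of Python. This is our own illustration, not the CDC 3200 program: a `bytearray` with one byte per odd number stands in for the 'smallest addressable unit', the block is a single half-open interval $[lo, hi)$ rather than a sequence of blocks of length $2^{15} - 1$, and the function names are hypothetical. The gap counts it returns are the kind of raw data from which the observed distributions are tabulated.

```python
import math
from collections import Counter


def base_primes(limit):
    """Odd primes <= limit, by an ordinary sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            flags[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i in range(3, limit + 1) if flags[i]]


def primes_in_block(lo, hi):
    """Primes p with lo <= p < hi; lo must be odd and at least 3.

    flags[j] stands for the odd number lo + 2*j, so only odd numbers are stored."""
    if lo < 3 or lo % 2 == 0:
        raise ValueError("lo must be odd and at least 3")
    size = (hi - lo + 1) // 2
    flags = bytearray([1]) * size
    for q in base_primes(math.isqrt(hi - 1)):
        start = max(q * q, -(-lo // q) * q)   # first multiple of q that is >= lo and >= q*q
        if start % 2 == 0:
            start += q                         # move to the next odd multiple
        first = (start - lo) // 2
        flags[first::q] = bytes(len(range(first, size, q)))   # odd multiples are 2q apart
    return [lo + 2 * j for j in range(size) if flags[j]]


def gap_counts(primes):
    """Counter mapping each gap length to the number of consecutive pairs with that gap."""
    return Counter(b - a for a, b in zip(primes, primes[1:]))
```

For example, `gap_counts(primes_in_block(10**7 + 1, 10**7 + 10**6))` tabulates the gaps among the primes in a million numbers above $10^7$.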

In a typical case, 347570 primes were found (in 243 sec.) in the interval $(10^{10}, 10^{10} + 8{,}000{,}074)$. The distribution of gaps is shown in Diagram 1, and Table 3 compares the actual and predicted distributions. Note the approximate equality of the peaks for gaps 2 and 4, the high peak for 6, and the general irregularity of the distribution, which are typical of all eleven cases, and as predicted by Conjecture 1.
Conclusion

Using Conjecture 1 and the constants $A_{r,k}$ in Table 1, the distribution of small prime gaps predicted was in good agreement with empirical results for over 4,000,000 gaps. As the distribution is so irregular, which can be seen by a glance at Diagram 1, it is hard to believe that this good fit is just a coincidence. Hence any results to be proved concerning, say, twin primes will probably have to be compatible with Conjecture 1, or at least with Corollary 3.
Table 1

 r   k=1       k=2       k=3       k=4       k=5       k=6       k=7       k=8       k=9       k=10
 1   1.3203
 2   1.3203    0
 3   2.6406    -5.7165   0
 4   1.3203    -5.7165   4.1512    0
 5   1.7604    -8.5747   8.3023    0         0
 6   2.6406    -20.008   41.512    -20.264   0         0
 7   1.5844    -14.291   38.744    -30.395   0         0         0
 8   1.3203    -14.291   49.814    -60.790   17.298    0         0         0
 9   2.6406    -32.870   138.37    -222.90   103.79    0         0         0
10   1.7604    -27.868   160.51    -405.27   415.16    -107.94   0         0
11   1.4670    -22.509   124.93    -295.51   249.10    0         0         0
12   2.6406    -48.590   343.56    -1161.8   1868.2    -1133.4   0         0
13   1.4404    -29.869   243.58    -989.75   2087.7    -2140.9   831.88    0
14   1.5844    -33.048   270.42    -1097.6   2290.3    -2266.8   792.27    0
15   3.5209    -86.248   855.67    -4408.2   12542.    -19272.   14320.    -3780.3   0         0
16   1.3203    -36.300   413.65    -2532.1   9022.4    -18953.   22649.    -13945.   3409.4    0
17   1.4083    -39.046   448.96    -2771.5   9927.3    -20794.   24340.    -14176.   3117.2    0
18   2.6406    -79.332   1000.5    -6889.0   28204.    -70154.   104110    -87178.   36919.    -6124.0
19   1.3980    -44.642   600.71    -4425.9   19396.    -51300.   78886.    -63103.
20   1.7604    -58.135   815.12    -6321.4   29583.    -85427.   149070    -146590
21   3.1688    -115.20   1803.0    -15890.   86545.    -300640   662380    -888200
22   1.4670    -56.513   946.23    -9036.7   54285.    -213220

The constants $A_{r,k}$. The last digit may be in error by 2 or 3, especially for higher $k$. Values which are omitted are zero for $r \le 18$.


Table 2

log10 N   a          b           n + 1     χ²_21    P(χ² > χ²_21)
7.00      6·10^6     8,000,034   497230    15.28    0.81
7.38      2·10^7     8,000,098   470830    14.08    0.87
7.81      6·10^7     8,000,040   445230    15.55    0.79
8.31      2·10^8     8,000,022   418280    18.79    0.60
8.78      6·10^8     8,000,078   395930     8.73    0.991
9.00      1·10^9     8,000,198   386000    21.69    0.42
9.30      2·10^9     8,000,000   374240    27.03    0.17
9.78      6·10^9     8,000,004   355150     9.20    0.987
10.00     1·10^10    8,000,074   347570    15.54    0.79
10.18     15·10^9    8,000,000   341390    19.36    0.56
10.30     2·10^10    8,000,000   337310    10.99    0.96

Empirical results for the distribution of prime gaps. The interval searched is $(a, a + b)$ with midpoint $N$; the number of primes in the interval is $n + 1$ (so there are $n$ gaps). Testing the fit of the actual and predicted distributions of gaps of length $2, 4, \ldots, 42$ and the remainder gives $\chi^2_{21}$ with 21 degrees of freedom.


Table 3

r            f        f̂        e
1            19945    19950    +0.09
2            19977    19950    +0.31
3            36145    36112    +0.17
4            16525    16300    +0.19
5            21054    21188    -0.92
6            28009    27900    +0.65
7            15785    15613    +1.36
8            11975    11905    +0.62
9            21956    21981    -0.18
10           12403    12395    +0.07
11           10510    10595    -0.81
12           16435    16449    -0.11
13           7810     7979     +0.34
14           8896     8710     +1.99
15           15957    16147    -1.50
16           5249     5222     +0.38
17           5535     5504     +0.38
18           9200     9185     +0.18
19           4428     4597     +0.47
20           5215     5257     -0.58
21           8035     8007     +0.29
22,...,150   46735    46867    -0.61

Distribution of the 347,569 prime gaps in the interval $(10^{10}, 10{,}008{,}000{,}074)$. For a gap of length $2r$ the actual frequency is $f$ and the predicted frequency is $\hat f$ (with equal totals). The $\chi^2$ test gives $P = 0.79$, so it does not show a significant difference between the two distributions.
Diagram 1. The frequency of occurrence of small prime gaps in the interval (10,000,000,000, 10,008,000,074). (Horizontal axis: gap length.)